Abstract. In model-based recognition a number of ad hoc techniques are used to
decide whether or not a match of data to a model is correct. Generally an empirically
determined threshold is placed on the fraction of model features that must be matched.
In  this  paper  we  present  a  more  rigorous  approach  in  which  the  conditions  under  which  to 
accept  a  match  are  derived  based  on  fundamental  grounds.  We  obtain  an  expression  that 
relates  the  probability  of  a  match  occurring  at  random  to  the  fraction  of  model  features 
that  are  accounted  for  by  the  match.  This  expression  is  a  function  of  the  number  of  model 
features,  the  number  of  image  features,  and  a  bound  on  the  degree  of  sensor  noise. 

One  implication  of  our  analysis  is  that  a  proper  threshold  for  matching  must  vary 
with  the  number  of  model  and  data  features.  Thus  it  is  important  to  be  able  to  set  the 
threshold  as  a  function  of  a  particular  matching  problem,  rather  than  setting  a  single 
threshold  based  on  experimentation.  We  analyze  some  existing  recognition  systems  and 
find  that  our  method  yields  thresholds  similar  to  the  ones  that  were  determined  empirically 
for these systems, providing evidence of the validity of the technique.

1. Introduction

A central problem in machine vision is that of recognizing partially occluded objects
from noisy data. Recognition systems generally search for a matching between
elements  of  an  object  model  and  instances  of  those  elements  in  the  data,  recovering 
a  transformation  that  maps  part  of  the  model  onto  part  of  the  image.  There  are 
a number of different approaches to this model-based recognition problem, including
clustering in parameter space (e.g., Stockman [1987], Stockman et al. [1982],
Thompson and Mundy [1987]), searching a tree of corresponding model and image
features (e.g., Grimson [1989a, 1989b], Grimson and Lozano-Perez [1984, 1987],
Ettinger [1987, 1988], Murray [1987a, 1987b], Murray and Cook [1988], Ayache and
Faugeras  [1986],  Faugeras  and  Hebert  [1986],  Ikeuchi  [1987]),  and  directly  searching 
for  possible  transformations  from  a  model  to  an  image  (e.g.,  Fischler  and  Bolles 
[1981],  Huttenlocher  and  Ullman  [1987,  1988])  (see  also  Chin  and  Dyer  [1986]  and 
Besl  and  Jain  [1985]  for  more  comprehensive  reviews).  These  approaches  all  share 
the  common  property  that  a  decision  is  made  about  the  presence  or  absence  of  an 
object  on  the  basis  of  geometric  evidence  acquired  from  the  sensory  input.  In  this 
paper  we  investigate  the  nature  of  this  decision  process  and  develop  a  formal  means 
for  deciding  when  a  match  should  be  accepted  as  correct. 

To  determine  what  constitutes  an  acceptable  match  of  a  model  to  an  image, 
most  recognition  systems  use  one  of  two  ad  hoc  approaches.  The  first  approach  is  to 
find  all  possible  interpretations  and  order  them  by  some  measure  of  completeness, 
such  as  the  percentage  of  the  model  accounted  for.  The  best  interpretations,  as 
defined  by  this  measure,  are  then  taken  as  correct  solutions.  Suppose  one  is  looking 
for  interpretations  in  the  data  of  a  particular  object  from  the  library  of  possible 
objects.  If  an  instance  of  a  particular  object  model  is  present  in  the  scene  and  the 
measure of completeness is well behaved, then this approach will correctly find the
interpretations. If no instance of a particular object model is present in the scene,
the interpretations of this object that best account for the data are in fact incorrect.
In this case, one must either accept false interpretations or have some means of
deciding whether or not the object is present. Furthermore, this approach is
computationally expensive, because finding all possible interpretations requires
exploring the entire search space.

The  second  common  approach  is  to  again  apply  a  measure  of  completeness  to 
each  hypothesized  match,  but  to  use  this  measure  to  prematurely  terminate  the 
search  as  soon  as  an  interpretation  is  found  whose  measure  exceeds  some  threshold. 
Termination  can  be  based  strictly  on  the  completeness  of  the  current  interpretation, 
or  can  involve  examining  the  data  for  additional  confirming  or  refuting  evidence. 
Finding additional evidence can increase the measure of completeness of an interpretation,
but one is still left with the problem of deciding whether an interpretation
is  good  enough  to  accept  as  a  correct  match. 

Current methods for deciding whether a match is correct are based on empirically
determined thresholds. A more rigorous approach would be to derive conditions
under which to accept a match that are based on fundamental grounds. In this paper
we  analyze  the  problem  of  determining  what  constitutes  a  good  match  of  a  model 
to  an  image.  In  particular  we  derive  an  expression  that  relates  the  probability  of  a 
false  match  to  the  fraction  of  model  features  that  are  accounted  for  by  the  match. 
This  expression  is  a  function  of  the  number  of  model  features,  the  number  of  image 
features,  and  a  bound  on  the  degree  of  tolerable  sensor  noise.  The  derivation  results 
from  an  examination  of  the  likelihood  of  false  positives  (i.e.,  interpretations  that  are 
incorrect  but  arise  due  to  a  random  coincidence  of  events  in  the  image). 

We  then  use  this  relation  to  define  a  threshold  on  the  fraction  of  features  that 
must  be  matched  in  order  to  limit  the  probability  of  a  random  coincidence  to  some 
level. We analyze some existing recognition systems ([Grimson and Lozano-Perez,
1984, 1987], [Ayache and Faugeras, 1986]) and find that our technique yields thresholds
similar to the ones that were determined empirically for these systems. This
provides  experimental  evidence  of  the  validity  of  the  technique,  and  suggests  that  it 
can  be  used  profitably  to  set  thresholds  for  other  recognition  tasks  and  systems. 
Specifically,  we  address  the  following  question: 

•  Suppose that we are given a model with m features, a set of s data features
from a sensor, and bounds $\epsilon_p$ and $\epsilon_a$ on the positional and orientational error
in the data. Further, suppose that some recognition method has found a match
that accounts for a fraction $f$ ($f \in [0, 1]$) of the m model features. What is
the relation between $f$ and the likelihood $\delta$ that such a match can occur at
random?

We use this relation to set a threshold on the minimum fraction of model features
that must be matched, $f_0$, such that the likelihood of such a match occurring at
random is small (e.g., $\delta < .001$). Note that there is not necessarily a value of $f_0$ for
any choice of $\delta$ (in particular as $\delta$ gets very small, or as m, s, $\epsilon_p$ or $\epsilon_a$ get very large
there may be no fraction of model features that limits the probability of a random
match to $\delta$).

There are three basic steps to the technique. First, given a particular type
of  feature,  the  type  of  transformation  from  a  model  to  an  image,  and  a  bound 
on  the  sensor  error,  we  characterize  the  set  of  transformations  that  are  consistent 
with  a  single  pairing  of  model  and  image  features.  This  set  of  transformations  is 
described by a volume V in the transformation space (a d-dimensional space with
one  dimension  corresponding  to  each  of  the  d  transformation  parameters). 

We then determine the probability, $\Pr\{e \ge l\}$, that the number of events (in
this case the number of such volumes) that intersect at a common point in the
transformation space is at least $l$. This likelihood is an estimate of how often a
match of $l$ features will occur at random. The probability of $l$ volumes intersecting
is  estimated  by  considering  the  limiting  case  of  a  statistical  occupancy  problem  as 
the  number  of  observations  and  cells  goes  to  infinity  [Feller,  1968].  This  method  is 
similar  to  that  used  for  the  analysis  of  the  generalized  Hough  transform  in  [Grimson 
and  Huttenlocher,  1989]. 


Finally, the probability that $l$ volumes will intersect at random is used to set a
threshold on the minimum fraction of model features, $f_0$, that must be matched in
order to accept an interpretation. In particular we set the threshold $f_0$ such that
$m f_0 \le l$, and $\Pr\{e \ge l\}$ is a tolerable false matching rate $\delta$.

2.  The  Space  of  Transformations 

For rigid objects, the pose of an object with respect to a sensor can be characterized
by a transformation from the model to the sensor coordinate systems. In this
paper we focus on the case where this transformation is a similarity (i.e., consisting
of translation, rotation, and scaling). The set of possible solutions to a given
recognition problem can be viewed as a transformation space having one dimension
corresponding to each parameter of the transformation from model to sensor coordinates.
A point in this transformation space defines a pose of an object, which
in turn defines a possible solution to the recognition problem. For example, with
a two-dimensional image and world, the transformation space is four-dimensional
(translation in x and y, rotation in the plane, and scaling).

A matching of a model feature with an image feature (such as an edge or vertex)
defines a range of possible transformations from the model to the image, that is, a
volume in the transformation space. The size and shape of this volume depend on
the type of feature and on the degree of accuracy in the measurement of the features.
In this section we present an analytic expression for the size of this volume. This
expression is related to that developed in [Grimson and Huttenlocher, 1989] for
characterizing the range of feasible transformations when the transformation space
is tessellated at some sampling rate. Here, we determine an expression for the volume
of feasible transformations in a continuous transformation space.

In  this  section  we  limit  the  discussion  to  the  case  of  two-dimensional  matching 
problems where the transformation is an isometry (translation and rotation without
scaling),  and  the  features  are  linear  edge  fragments.  The  method  also  applies  to 
point  features,  as  discussed  at  the  end  of  the  section.  A  similar  analysis  holds  for 
three-dimensional  matching  problems  and  for  problems  involving  change  of  scale,  as 
described  in  the  appendix. 

Consider  the  problem  of  recognizing  a  two-dimensional  polygonal  model  from 
noisy, occluded data. If $\mathcal{M}$ is the model coordinate system, we let

$M_J$ be the vector to the midpoint of the Jth model edge, measured in $\mathcal{M}$,

$T_J$ be the unit tangent of the edge, measured in $\mathcal{M}$,

$L_J$ be the length of the edge.

We let $m_j, t_j, l_j$ denote similar parameters for the jth data edge, measured in the
sensor-based coordinate system, X. (Note that we use upper case characters to
distinguish model parameters and lower case characters to distinguish sensory data
parameters.)

The transformation from model coordinates to sensor coordinates may be represented
by

$$v_s = R_\theta V_M + V_0$$

where $V_M$ is a vector in model coordinates, $R_\theta$ is a rotation matrix corresponding
to an angle of $\theta$, $V_0$ is a translation offset, and $v_s$ is the corresponding vector in
sensor coordinates.

We need to know what transformations will map a model edge to a data edge.
First, if $l_j > L_J$, we assume that the two edges cannot match. Thus, suppose that
$l_j \le L_J$. Then the rotation needed to align the two tangents is given by the angle
$\theta_m$ between $T_J$ and $t_j$, and this defines a rotation matrix $R_{\theta_m}$. If we apply this
rotation to the set of edge points

$$\left\{ M_J + \gamma T_J \;\middle|\; \gamma \in \left[ -\frac{L_J}{2}, \frac{L_J}{2} \right] \right\}$$

we get a set of transformed points

$$\left\{ R_{\theta_m} \left[ M_J + \gamma T_J \right] \;\middle|\; \gamma \in \left[ -\frac{L_J}{2}, \frac{L_J}{2} \right] \right\}.$$

To align the edges, we need to translate these rotated points. Now, because
$l_j \le L_J$, there are many transformations that will cause the edges to overlap.
Consider one endpoint of the data edge

$$p_j = m_j - \frac{l_j}{2} t_j.$$

If this happens to coincide with a model edge endpoint,

$$M_J - \frac{L_J}{2} T_J,$$

then

$$m_j - \frac{l_j}{2} t_j = R_{\theta_m} \left[ M_J - \frac{L_J}{2} T_J \right] + V_0$$

so that the translation is

$$V_0 = m_j - R_{\theta_m} M_J + \frac{L_J - l_j}{2} R_{\theta_m} T_J$$

because $R_{\theta_m} T_J = t_j$. Similarly, if the other endpoints align, we get

$$V_0 = m_j - R_{\theta_m} M_J - \frac{L_J - l_j}{2} R_{\theta_m} T_J.$$

Because any intermediate position is also acceptable, the set of translations consistent
with matching model edge J to data edge j is given by

$$\left\{ m_j - R_{\theta_m} M_J + \gamma R_{\theta_m} T_J \;\middle|\; \gamma \in \left[ -\frac{L_J - l_j}{2}, \frac{L_J - l_j}{2} \right] \right\}. \qquad (1)$$

Hence, matching model edge J to data edge j yields a set of points in transform
space V, with a single value for the rotation parameter and a set of values for the
translation, that correspond to a line of length $L_J - l_j$, with orientation $R_{\theta_m} T_J$ in
the x-y plane. This is shown in Figure 1.
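The noise-free construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rotate` and `feasible_translations` are hypothetical, vectors are plain 2-tuples, and the rotation angle is recovered with `atan2` from the two unit tangents.

```python
import math

def rotate(v, theta):
    """Rotate the 2-D vector v counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def feasible_translations(M_J, T_J, L_J, m_j, t_j, l_j):
    """Noise-free set of translations for one model-edge/data-edge pairing.

    Returns (theta_m, start, end), where start and end are the endpoints of
    the segment of feasible translations, or None when the data edge is
    longer than the model edge (the pairing is then assumed impossible).
    """
    if l_j > L_J:
        return None
    # Angle that rotates the model tangent T_J onto the data tangent t_j.
    theta_m = math.atan2(t_j[1], t_j[0]) - math.atan2(T_J[1], T_J[0])
    RM = rotate(M_J, theta_m)
    RT = rotate(T_J, theta_m)
    half = (L_J - l_j) / 2.0
    # The segment is centred at m_j - R M_J and extends +/- half along R T_J.
    cx, cy = m_j[0] - RM[0], m_j[1] - RM[1]
    start = (cx - half * RT[0], cy - half * RT[1])
    end = (cx + half * RT[0], cy + half * RT[1])
    return theta_m, start, end
```

For a model edge of length 4 and a data edge of length 2, the returned segment has length 2, the difference of the two lengths.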

Figure 1. Range of feasible translations, for fixed $\theta$ and with no position error. The line
in the direction of $R_\theta T_J$ denotes the set of feasible translations for a given value of $\theta$.

This, however, ignores the issue of noise in the measurements. In practice, we
may only know the position of the endpoints of the data edge to within some ball
(which in two dimensions is just a circle) of radius $\epsilon_p$, and the orientation to within
an angular error of $\epsilon_a$. For the case of two dimensional lines, these error ranges
are related. Given endpoint variations of $\epsilon_p$, it is straightforward to show that the
maximum angular variation occurs when the correct line is tangent to both circles
of radius $\epsilon_p$ about the two endpoints, and is given by

$$\epsilon_a = \sin^{-1}\left( \frac{2\epsilon_p}{l} \right)$$

provided $l > 2\epsilon_p$.
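The tangency condition can be turned into a small helper. The sketch below assumes the geometry described in the text: a line tangent to both error circles crosses the measured edge at its midpoint, at distance l/2 from each circle centre, so the sine of the angular deviation is the radius divided by l/2. The function name `angular_error_bound` is hypothetical.

```python
import math

def angular_error_bound(eps_p, l):
    """Maximum angular deviation of an edge of measured length l whose
    endpoints are each known to within a circle of radius eps_p.

    Assumes the tangent-line construction: sin(eps_a) = eps_p / (l / 2).
    """
    if l <= 2.0 * eps_p:
        raise ValueError("the bound is only defined for l > 2 * eps_p")
    return math.asin(2.0 * eps_p / l)
```

For example, an edge of length 4 with unit endpoint error gives a bound of 30 degrees (pi/6 radians).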

Inclusion of error effects on position measurements implies that the line of feasible
translations, for a given rotation (as given by equation (1)), must be expanded to
include any points in the parameter space within $\epsilon_p$ of that line. Further, this
expansion into a region must be repeated for each value of $\theta$ in $[\theta_m - \epsilon_a, \theta_m + \epsilon_a]$.
Note that this carves out a skewed volume in transform space, because the region's
center and orientation are functions of $\theta$ (see equation (1)). This observation has
been carefully analyzed in [Clemens, 1986]. The volume is illustrated in Figure 2.

Thus, given $M_J, T_J, L_J, m_j, t_j, l_j$, we will use the following conditions:

•  If $l_j - 2\epsilon_p > L_J$, then there are no consistent transformations,

•  Otherwise, the set of feasible transformations is denoted by the volume

$$V(j, J) = \bigcup_{\theta \in [\theta_m - \epsilon_a,\, \theta_m + \epsilon_a]} S(\theta, j, J)$$

where an individual set of translations is denoted by:

$$S(\theta, j, J) = \left\{ (\theta, V_0) \;\middle|\; \exists \gamma,\ |\gamma| \le \frac{L_J - l_j}{2},\ \left\| m_j - R_\theta M_J + \gamma R_\theta T_J - V_0 \right\| \le \epsilon_p \right\}.$$

Figure 2. Range of feasible translations, with error. The region enclosed in solid lines
indicates the slice $S(\theta, j, J)$ for a particular value of $\theta$. As $\theta$ varies, this slice rotates
through a helical path, as indicated by the region enclosed by a dashed line.
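A membership test for a single slice can be sketched as follows. It assumes that a translation belongs to the slice for a given rotation when it lies within the positional error bound of the noise-free segment of translations, and it clamps the half-length of that segment at zero when the data edge is slightly longer than the model edge. Both the function name `in_slice` and the clamping are assumptions made for illustration.

```python
import math

def in_slice(theta, V0, M_J, T_J, L_J, m_j, l_j, eps_p):
    """Return True if the translation V0 lies in the slice for rotation theta."""
    c, s = math.cos(theta), math.sin(theta)
    RM = (c * M_J[0] - s * M_J[1], s * M_J[0] + c * M_J[1])
    RT = (c * T_J[0] - s * T_J[1], s * T_J[0] + c * T_J[1])
    # Offset of V0 from the centre of the noise-free segment, m_j - R M_J.
    dx = V0[0] - (m_j[0] - RM[0])
    dy = V0[1] - (m_j[1] - RM[1])
    half = max(0.0, (L_J - l_j) / 2.0)
    # Closest point on the segment: project onto R T_J and clamp gamma.
    gamma = max(-half, min(half, dx * RT[0] + dy * RT[1]))
    return math.hypot(dx - gamma * RT[0], dy - gamma * RT[1]) <= eps_p
```

Sweeping `theta` over the allowed angular range and collecting the accepted translations traces out the skewed volume shown in Figure 2.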


We can use this expression to determine the size of the set of feasible transformations.
Since each slice $S(\theta, j, J)$ consists of two hemicircles and a rectangle, it is easy to
show that the volume of the region defined above is given by

$$c_{jJ} = 2\epsilon_a \left[ 2\epsilon_p (L_J - l_j) + \pi \epsilon_p^2 \right].$$

The term in square braces corresponds to the area of a single slice; this is integrated
over a range of angles, yielding the $2\epsilon_a$ term.

For simplicity, we let the data edge have a length

$$l_j = (1 - \alpha_{jJ}) L_J$$

where $\alpha_{jJ}$ denotes the amount of occlusion of the edge, so that the expression for
the volume becomes

$$c_{jJ} = 2\epsilon_a \left[ 2\epsilon_p \alpha_{jJ} L_J + \pi \epsilon_p^2 \right]. \qquad (2)$$
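As a quick numerical check, the volume for a single pairing can be computed directly: each slice is a rectangle of length equal to the occluded portion of the model edge and width twice the positional error, capped by two half-discs, and the slice area is multiplied by the angular range. The function names below are hypothetical, and the numbers in the usage note are illustrative rather than taken from any system.

```python
import math

def edge_volume(L_J, alpha, eps_p, eps_a):
    """Volume of transformation space consistent with one edge pairing.

    alpha is the occluded fraction of the model edge, so the data edge has
    length (1 - alpha) * L_J.
    """
    slice_area = 2.0 * eps_p * alpha * L_J + math.pi * eps_p ** 2
    return 2.0 * eps_a * slice_area

def point_volume(eps_p, eps_a):
    """Limit of edge_volume as the edge length goes to zero."""
    return edge_volume(0.0, 0.0, eps_p, eps_a)
```

With `L_J = 10`, `alpha = 0.5`, `eps_p = 1` and `eps_a = 0.1`, the volume is `0.2 * (10 + pi)`, roughly 2.63.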

If we are dealing with point features, rather than extended edges, the above
result can be specialized. Here $L_J \to 0$ so that equation (2) becomes

$$c_{jJ} = 2\epsilon_a \pi \epsilon_p^2. \qquad (2a)$$

Now, what is $\epsilon_a$ in the case of a point feature? If the feature is a vertex, one can use
the direction of the bisector of the two edges defining the vertex as the orientation
of the vertex, and hence can bound the error in measuring that orientation as $\epsilon_a$.
Similarly, if the vertex is a curvature extremum or a point of curvature inflection,
one can use the local tangent of the curve to define the orientation, and $\epsilon_a$ is again
defined by a bound on measuring this orientation. If the vertices are truly isolated
points, then $\epsilon_a = \pi$. In any event, our analysis provides estimates for $c_{jJ}$ both for
the case of edge features and for the case of vertices.

For the case of a rigid two-dimensional isometric transformation, we have characterized
the volume of transformation space, $c_{jJ}$, that is consistent with a single
data-model pairing. This expression is given by equation (2) for edge features
and equation (2a) for point features. The expression is a function of the noise in the
data measurements, $\epsilon_p$ and $\epsilon_a$, and in the case of edges is further a function of the
amount of occlusion, $\alpha_{jJ}$, and the length of the model edge, $L_J$. In the appendix we
consider adding scaling to the transformation as well as the case of three-dimensional
transformations. We now turn to the question of how these volumes interact.


3.  The  Probability  of  a  Conspiracy 

In  the  previous  section,  we  characterized  the  volume  of  transformation  space  that 
is  consistent  with  a  data-model  pairing.  If  two  such  volumes  overlap,  then  their 
intersection  defines  the  set  of  transformations  that  are  consistent  with  both  of  the 
data-model  pairings.  Thus  a  correct  match  of  a  model  to  an  image  will  lie  in  the 
intersection  of  several  volumes.  In  this  section  we  investigate  the  likelihood  that  l 
volumes  in  transformation  space  will  intersect  at  random.  Such  an  event  corresponds 
to  an  arrangement  of  image  features  that  happens  to  be  consistent,  within  error 
bounds,  with  l  of  the  model  features,  but  which  does  not  actually  correspond  to  an 
instance  of  the  object. 

The likelihood that $l$ transformation space volumes will intersect at random is
a function of their number and size. The number of volumes depends on the number
of model and image features. The size of each volume depends on the amount of
noise in the data, the type of feature, and for edge features the amount of occlusion
of the edges. In order to be confident that a match accounting for $l$ model features
is correct, we would like to choose $l$ such that the likelihood of a random matching
of that size is very small.

In  order  to  characterize  the  likelihood  that  several  volumes  will  intersect  at 
random  we  make  use  of  a  statistical  occupancy  model.  In  the  discrete  case,  if  r 
events  are  uniformly  randomly  distributed  across  n  buckets,  an  occupancy  model 
can  be  used  to  estimate  the  probability  that  a  given  bucket  will  contain  k  events. 
The  events  in  our  case  are  points  in  the  volumes  in  transformation  space,  and  the 
buckets  are  points  in  the  transformation  space  itself.  These  events  and  buckets  are 
continuous  rather  than  discrete,  and  thus  we  are  concerned  with  the  limiting  case 
as $n, r \to \infty$.

The volume of transformation space defined by each incorrect model and image
feature  pairing  is  independent  of  the  correct  match.  Furthermore,  we  assume  that 
the  image  features  are  independent  of  one  another.  Thus  we  can  model  the  volumes 
in  transformation  space  as  independent  random  events.  The  distribution  of  these 
volumes  depends  on  the  image  features,  which  are  unknown,  so  we  assume  the 
uniform  distribution  as  an  approximation. 

While the volumes in transformation space can reasonably be viewed as independent
random events, we are modeling the probability of events occurring at
points in these volumes. As the number of volumes, R, gets large (compared with
the ratio of the total size of the transformation space to the size of each volume,
V/c) the overall distribution of points in the space is also random. For the cases of
interest here Rc > V, so the assumption of independent random pointwise events
is a reasonable approximation.

An alternative explanation of the independence of the pointwise events in transformation
space is the following. The probability that a particular point is consistent
with a given data-model pairing is equivalent to the probability that the point lies
within some neighborhood of the centroid of the given volume in transformation
space. Since the image features are assumed to be independently randomly distributed,
this probability is independent of the choice of image feature. Thus in the
following  analysis  we  assume  that  the  events  in  transformation  space  are  randomly 
distributed,  and  use  the  uniform  distribution  as  an  approximation. 

Given a uniform random distribution of r events into n cells such that each of
the $n^r$ placements has equal probability, the probability that a given cell contains
exactly k events is given by the binomial distribution

$$p_k = \binom{r}{k} \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{r-k}.$$

In the limit, as $n, r \to \infty$ where the ratio $r/n \to \lambda$, the binomial distribution is
approximated by

$$p_k \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$

This distribution is often termed the Maxwell-Boltzmann statistic (for a standard
reference see [Feller, 1968]).

In addition to the Maxwell-Boltzmann distribution, another common distribution
used in occupancy problems is the Bose-Einstein statistic, which has an
experimental basis in particle physics. Under the Bose-Einstein model, for large r
and n where $r/n \to \lambda$, the limiting case is the geometric distribution, where

$$p_k \approx \frac{\lambda^k}{(1 + \lambda)^{k+1}}.$$

This distribution has a long tail as $k \to \infty$, and thus predicts large peaks with a
higher probability than does the Maxwell-Boltzmann model. We are interested in
establishing conservative bounds on the likelihood that a large number of volumes
will intersect at random, thus we use the Bose-Einstein statistic because it provides
a higher estimate of this likelihood.
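The difference between the two tails is easy to see numerically. The sketch below compares the probability of at least l coincident events under the Poisson limit of the Maxwell-Boltzmann model and under the geometric limit of the Bose-Einstein model; summing the geometric series gives the closed form used for the latter. Function names and the sample values are hypothetical. Note that the ordering holds for larger peaks; for very small l (for example l = 1) the Poisson tail can be the larger one.

```python
import math

def poisson_tail(lam, l):
    """Pr{e >= l} when p_k = exp(-lam) * lam**k / k!."""
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                     for k in range(l))

def bose_einstein_tail(lam, l):
    """Pr{e >= l} when p_k = lam**k / (1 + lam)**(k + 1).

    The sum over k >= l of this geometric distribution is (lam / (1 + lam))**l.
    """
    return (lam / (1.0 + lam)) ** l
```

With `lam = 0.5` and `l = 5`, the geometric tail is about 4.1e-3 while the Poisson tail is about 1.7e-4, so the Bose-Einstein model is the more conservative of the two for peaks of that size.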
The parameter $\lambda$ of the occupancy model is the ratio of the occupied volumes
of the transformation space to the total size of the transformation space. From
equations (2) and (2a) in the previous section we know that each pair of model and
image features defines a volume of size $c_{jJ}$ in transformation space. There are ms
such volumes for m model features and s image features, so the occupied volume of
the transformation space is given by

$$\sum_{j=1}^{s} \sum_{J=1}^{m} c_{jJ}.$$

The total size of the transformation space is just the product of the ranges for the
dimensions of the space. Each rotational dimension ranges over the interval $[0, 2\pi]$,
and each translational dimension ranges over $[0, D]$, where D is the linear extent of
the image. Thus in the case of a two dimensional isometry (translation and rotation)
we get

$$\lambda = \frac{\sum_{j=1}^{s} \sum_{J=1}^{m} c_{jJ}}{2\pi D^2}.$$

We can simplify this to

$$\lambda = m s \bar{c}$$

where $\bar{c}$ is the average normalized volume size.

In the case of two-dimensional edge features, from equation (2) we obtain the
average normalized volume size

$$\bar{c} = \frac{2\epsilon_a \left[ 2\epsilon_p \alpha L + \pi \epsilon_p^2 \right]}{2\pi D^2}$$

where L is the average edge length, $\alpha$ is the average amount of occlusion of the
edges (the average value of $\alpha_{jJ}$), and where we have incorporated the normalizing
term $2\pi D^2$. Note that as expected $\bar{c}$ increases as the noise $\epsilon_a, \epsilon_p$ increases, and also
$\bar{c}$ increases as the average amount of occlusion of the edges $\alpha$ increases.

For two-dimensional point features (with associated orientations), the average
normalized volume size is given by

$$\bar{c} = \frac{2\epsilon_a \pi \epsilon_p^2}{2\pi D^2}.$$

Note that we can restrict $\epsilon_a \le \pi$ and $\epsilon_p \le D/2$. In the extreme case, this can lead to
$\bar{c} > 1$, which does not make physical sense. We should really take the minimum of
the above expressions and 1, but in practice $\bar{c}$ is usually much smaller than this and
hence we ignore this special case.

A particular recognition task thus defines a value for $\lambda$, based on the type of
transformation from the model to the image, the type of features, the number of
model features m, the number of data features s, and a bound on the positional and
angular error, $\epsilon_p$ and $\epsilon_a$.
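To make the dependence on the task parameters concrete, the following sketch computes the average normalized volume for edge features and the resulting occupancy parameter. The function names are hypothetical, the sample values in the usage note are illustrative and not drawn from any of the systems discussed, and the clamp at 1 follows the remark that the normalized volume cannot meaningfully exceed 1.

```python
import math

def normalized_edge_volume(eps_a, eps_p, alpha, L, D):
    """Average normalized volume for 2-D edge features under an isometry.

    eps_a, eps_p: angular and positional error bounds
    alpha: average occluded fraction, L: average model edge length
    D: linear extent of the image
    """
    volume = 2.0 * eps_a * (2.0 * eps_p * alpha * L + math.pi * eps_p ** 2)
    return min(1.0, volume / (2.0 * math.pi * D ** 2))

def occupancy_parameter(m, s, c_bar):
    """Occupancy parameter: m model features, s data features, average
    normalized volume c_bar per pairing."""
    return m * s * c_bar
```

For instance, with `eps_a = pi / 10`, `eps_p = 5`, `alpha = 0.5`, `L = 50` and `D = 500`, the normalized volume is about 1.31e-4, and 32 model features with 100 data features give an occupancy parameter of about 0.42.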

Given a value for $\lambda$, we are interested in the probability that $l$ or more of the
volumes intersect at random, which is given by

$$\Pr\{e \ge l\} = 1 - \sum_{k=0}^{l-1} p_k.$$

This corresponds to an arrangement of data features occurring at random such
that $l$ of the model features can be matched (within the error bounds) to those data
features. From $\Pr\{e \ge l\}$ we can determine the fraction of model features, $f_0$, such
that the probability of $m f_0$ features being matched at random is less than some
predefined level, $\delta$. This value is just the smallest $f$ such that

$$\Pr\{e \ge mf\} \le \delta$$

i.e.

$$f_0 = \min\{ f \mid \Pr\{e \ge mf\} \le \delta \}.$$

We then choose $\delta$ such that the probability of a false match is small, for example
$\delta = .001$.
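The definition of the threshold can be evaluated directly by searching over the number of matched features. The sketch below does this for the geometric (Bose-Einstein) form of the occupancy probabilities; the function names are hypothetical, and restricting the search to whole numbers of features between 0 and m is an assumption made for illustration.

```python
def bose_einstein_pk(k, lam):
    """Probability of exactly k coincident events under the geometric model."""
    return lam ** k / (1.0 + lam) ** (k + 1)

def threshold_by_search(m, lam, delta, pk=bose_einstein_pk):
    """Smallest fraction l / m such that Pr{e >= l} <= delta.

    Returns None when even a match of all m features is too likely to
    arise at random.
    """
    cumulative = 0.0  # running sum of p_k for k < l
    for l in range(0, m + 1):
        if 1.0 - cumulative <= delta:
            return l / m
        cumulative += pk(l, lam)
    return None
```

With `m = 32`, an occupancy parameter of 0.7088 and a false-match level of .001, the search stops at 8 features, a fraction of 0.25.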

The  analysis  in  this  section  simply  counts  each  pairing  of  a  data  feature  with 
a  model  feature  equally.  It  is  also  possible  to  weight  the  events  by  the  amount  of 
model  accounted  for.  Below  we  consider  the  case  of  weighting  each  feature  match 
by  the  length  of  the  matched  edge. 

4.  Deriving  Formal  Thresholds 

We  have  used  an  occupancy  model  to  determine  an  expression  for  the  probability 
that  l  or  more  volumes  in  transformation  space  will  intersect  at  random.  This 
expression  is  a  function  of  the  number  of  features,  the  type  of  features,  and  bounds 
on the sensor error. The expression was then used to set a threshold, $f_0$, on the
fraction of model features that must be matched in order to limit the probability of
a random matching to some level. In this section we derive a closed-form expression
for $f_0$.

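Under Bose-Einstein statistics, we have

$$p_k \approx \frac{\lambda^k}{(1 + \lambda)^{k+1}}$$

or equivalently

$$p_k \approx \frac{1}{1 + \lambda} \left( \frac{\lambda}{1 + \lambda} \right)^k.$$

The probability that there will be $l$ or more events occurring at a point is given by

$$\Pr\{e \ge l\} = 1 - \sum_{k=0}^{l-1} p_k.$$

We are interested in finding a threshold for distinguishing correct from random
interpretations. This can be done by setting the threshold, $f_0$, to be the fraction
of model features such that $l = m f_0$. If we choose a value $\delta$ for the probability
that there will be $m f_0$ or more events occurring at random (i.e. a limit on the false
positive rate), then the condition on $f_0$ is given by

$$1 - \sum_{k=0}^{m f_0 - 1} p_k \le \delta.$$

Substituting for $p_k$ yields

$$1 - \frac{1}{1 + \lambda} \sum_{k=0}^{m f_0 - 1} \left( \frac{\lambda}{1 + \lambda} \right)^k \le \delta$$

and using the geometric series relation yields

$$1 - \frac{1}{1 + \lambda} \cdot \frac{\left( \frac{\lambda}{1 + \lambda} \right)^{m f_0} - 1}{\frac{\lambda}{1 + \lambda} - 1} \le \delta,$$

which simplifies to $\left( \frac{\lambda}{1 + \lambda} \right)^{m f_0} \le \delta$. We can isolate $f_0$ by appropriate algebra:

$$f_0 \ge \frac{\log\left( \frac{1}{\delta} \right)}{m \log\left( 1 + \frac{1}{\lambda} \right)} \qquad (3)$$

where $\lambda = m s \bar{c}$.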
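The closed-form bound is a one-line computation. The sketch below uses a hypothetical function name and takes the number of model features, the number of data features, the average normalized volume and the tolerated false-match probability as inputs; the sample values in the usage note are the ones quoted later for the figures.

```python
import math

def match_threshold(m, s, c_bar, delta):
    """Lower bound on the fraction of model features that must be matched
    so that a random match has probability at most delta (Bose-Einstein
    occupancy model)."""
    lam = m * s * c_bar
    return math.log(1.0 / delta) / (m * math.log(1.0 + 1.0 / lam))
```

With `m = 32`, `s = 100`, `c_bar = .0002215` and `delta = .001`, the bound is roughly 0.245, so about a quarter of the model features must be matched. A returned value above 1 means the requested false-match level cannot be reached for that task.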


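The value of $\bar{c}$ depends on the particular type of feature being matched and
the bounds on the sensor error. In the case of two dimensional edge fragments
considered above, we derived

$$\bar{c} = \frac{2\epsilon_a \left[ 2\epsilon_p \alpha L + \pi \epsilon_p^2 \right]}{2\pi D^2} = \frac{\epsilon_a}{\pi} \left[ 2\alpha \frac{L}{D} \frac{\epsilon_p}{D} + \pi \left( \frac{\epsilon_p}{D} \right)^2 \right].$$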

Note that equation (3) exhibits expected behavior. If the noise in the data
increases, then $\bar{c}$ increases, and so does the bound on $f_0$. Similarly, as the amount
of occlusion increases, then so does $\bar{c}$ and thus the bound on $f_0$. As either m or s
increases so does the bound on $f_0$, and as $\delta$ decreases $f_0$ increases.

Also note that for large values of ms, the logarithm in the denominator can be
approximated by its first order term, and one gets the following approximation

$$f_0 \ge s \bar{c} \log\left( \frac{1}{\delta} \right).$$

Thus, in the limit, the bound on the fraction of the model is linear in the number of
sensory features, linear in the average size of the volumes in transformation space,
and varies logarithmically with the inverse probability of a false match.
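The quality of the first-order approximation can be checked numerically. In the sketch below the function names and the sample values are hypothetical; they are chosen so that the product of m, s and the normalized volume is large (here 100) while the resulting fraction stays below 1. Because log(1 + x) < x for x > 0, the approximation always lies slightly below the exact bound.

```python
import math

def exact_threshold(m, s, c_bar, delta):
    """Closed-form bound on the matched fraction (Bose-Einstein model)."""
    lam = m * s * c_bar
    return math.log(1.0 / delta) / (m * math.log(1.0 + 1.0 / lam))

def approximate_threshold(s, c_bar, delta):
    """First-order approximation: log(1 + 1/lam) is replaced by 1/lam."""
    return s * c_bar * math.log(1.0 / delta)
```

With `m = 2000`, `s = 200`, `c_bar = 0.00025` and `delta = .001`, the approximation is about 0.345 and the exact bound is about half a percent larger.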

The expression for $f_0$ in equation (3) can yield values that are greater than 1.0,
which makes no sense as a fraction of the model features. When $f_0$ is greater than
1.0 it means that for the given number and type of features, and the given bounds
on sensor error, it is not possible to limit the probability of a false match to the
chosen $\delta$ (even if all the model features are matched to some sensor feature).

Thus to obtain a value for the fraction of model features that must be matched
in order to limit the probability of a random conspiracy to $\delta$, we simply need to compute
$\bar{c}$ for the particular parameters of our recognition task, and then use equation
(3) to compute $f_0$.

There are several possible choices for $\delta$. One could simply set $\delta$ to be some
small number, e.g. $\delta = .001$, so that a false positive is likely to arise no more than
one time in a thousand. One could also set $\delta$ as a function of the scene complexity,
e.g. some multiple of the inverse of the total number of data-model pairings

$$\delta = \frac{\beta}{ms},$$

where $\beta$ is an arbitrarily chosen constant.

A third possibility is to set $\delta$ so that the likelihood of a false positive, integrated
over the entire transformation space, is small (e.g. less than 1). The idea is to
determine the appropriate value of $\delta$ such that the expectation is that no random
matches will occur. If we let $\nu$ be a measure of the sensitivity of our system in
distinguishing transformations, then we could choose $\delta$ as

$$\delta = \frac{\nu}{2\pi D^2}.$$

For example, we could set $\nu$ to be a function of the noise in the data measurements,
given by the uncertainty in orientation times the uncertainty in position:
$(2\epsilon_a)(\pi\epsilon_p^2)$. In this particular case, $\delta = \epsilon_a\epsilon_p^2/D^2$ and we get

$$f_0 \ge \frac{\log\left(\frac{D^2}{\epsilon_a\epsilon_p^2}\right)}{m\log\left(1 + \frac{1}{msc}\right)} \qquad (3a)$$
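The three choices of $\delta$ can be written as small helper functions. This is a sketch with hypothetical names; `beta = 1.0`, `eps_a = pi/10` and `D = 500` are assumed values chosen only for illustration.

```python
import math

def delta_fixed(value=0.001):
    """Choice 1: a fixed small probability of a false positive."""
    return value

def delta_scene(m, s, beta):
    """Choice 2: a multiple beta of the inverse number of data-model pairings."""
    return beta / (m * s)

def delta_integrated(eps_a, eps_p, D):
    """Choice 3: delta = nu / (2*pi*D**2) with nu = (2*eps_a) * (pi*eps_p**2),
    which simplifies to eps_a * eps_p**2 / D**2."""
    nu = (2.0 * eps_a) * (math.pi * eps_p ** 2)
    return nu / (2.0 * math.pi * D ** 2)

print(delta_fixed())
print(delta_scene(m=32, s=100, beta=1.0))
print(delta_integrated(eps_a=math.pi / 10, eps_p=10.0, D=500.0))
```

Any of these values can then be substituted for $\delta$ in equation (3); the third choice is the one that leads to equation (3a).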


To illustrate the values for $f_0$, we graph representative examples in Figures 3-5.
Figure 3 displays graphs of $f_0$ as a function of $s$, with $m = 32$, $c = .0002215$ (these
numbers are taken from the recognition systems analyzed in section 5). Each graph
is for a different value of $\delta$. Note that as $s$ gets large, the graphs become linear, as
expected.

Figure 4 displays $f_0$ as a function of $m$ for different values of $\delta$. Here, $s = 100$,
$c = .0002215$. Note that as expected, when $m$ becomes large, $f_0$ becomes a
constant independent of $m$.

Figure 5 displays $f_0$ as a function of the sensor error, for different values of $\delta$.
Here, $s = 100$, $m = 32$. The percentage of error $p$ along the horizontal axis is used
to define sensing errors of $\epsilon_a = p\pi$ and $\epsilon_p = pL$. As expected, the threshold on $f_0$
increases with increasing error.


Allowing  for  weighted  votes 

The preceding analysis treated each data-model feature pairing equally, and bounded
the probability that a given number of such pairings would be consistent at random.
Another approach is to weight the contribution of each data-model pairing by some measure.
One common scheme is to use the size of each data feature as a weight. In the case
of two dimensional edges, for example, a data-model pairing $(j, J)$ would carry a
weight of $\ell_j$ (the length of the data edge), so that transformations consistent with
pairings of long data edges to model edges would be more highly valued than those
involving short data edges.

We can modify our preceding analysis to handle this case as well. Note that the
parameter $\lambda$ essentially measures the average “vote” at each point in the
transformation space. Since we have assumed that each volume of transformations consistent

Figure 3. Bounds on the threshold: $f_0$ is graphed as a function of $s$, with other
parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom
respectively.


Figure 4. Bounds on the threshold: $f_0$ is graphed as a function of $m$, with other
parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom
respectively.


with  some  data-model  feature  pairing  is  independent,  we  can  derive  the  expected 
weighted  “vote”  at  any  point  in  transformation  space.  As  one  might  expect,  due  to 

Figure 5. Bounds on the threshold: $f_0$ is graphed as a function of error, with
other parameters fixed. The three graphs are for $\delta = .0001, .001, .01$ from top to bottom
respectively. The percentage of error $p$ along the horizontal axis is used to define sensing
errors of $\epsilon_a = p\pi$ and $\epsilon_p = pL$.


the independence, this simply yields

$$\lambda = ms\bar{\ell}c$$

where $\bar{\ell}$ is the average length of the data edges. Note that this is the average length
over all data edges, not just those that match the object.

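In this case we are interested in bounds on $f_0$ such that

$$1 - \sum_{k=0}^{mLf_0 - 1} p_k \le \delta.$$

Working through the same algebra as in the previous section leads to the following
bound:

$$f_0 \ge \frac{\log\left(\frac{1}{\delta}\right)}{mL\log\left(1 + \frac{1}{ms\bar{\ell}c}\right)} \qquad (4)$$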
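The weighted bound can be evaluated in the same way as the unweighted one. The sketch below assumes the form $f_0 \ge \log(1/\delta) / \left(mL\log\left(1 + 1/(ms\bar{\ell}c)\right)\right)$, where $L$ is the average model edge length and $\bar{\ell}$ the average data edge length. The function name, `eps_a = pi/10` and `D = 500` are assumptions; under them the zero-occlusion values come out close to the corresponding Table 2 entries for $\bar{\ell} = L$ and $\bar{\ell} = .5L$.

```python
import math

def weighted_threshold(m, s, c, L, mean_len, delta):
    """Length-weighted bound: votes are edge lengths, so lambda = m*s*mean_len*c
    and the threshold is a fraction of the total model length m*L."""
    lam = m * s * mean_len * c
    return math.log(1.0 / delta) / (m * L * math.log(1.0 + 1.0 / lam))

# Assumed sensor parameters (not given in this section).
eps_a, eps_p, D = math.pi / 10, 10.0, 500.0
m, s, L = 32, 100, 23.959

c0 = eps_a * (eps_p / D) ** 2          # c with no occlusion (alpha = 0)
delta = eps_a * eps_p ** 2 / D ** 2    # delta as chosen for equation (3a)

for scale in (1.0, 0.5):
    print(scale, round(weighted_threshold(m, s, c0, L, scale * L, delta), 3))
```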


5.  Some  Real  World  Examples 

To  demonstrate  the  utility  of  our  method,  in  this  section  we  analyze  some  working 
recognition  systems  that  utilize  a  threshold  on  the  fraction  of  model  features  which 
must  be  accounted  for  by  a  match.  We  find  that  the  analysis  predicts  thresholds 
that  are  close  to  those  that  were  determined  experimentally.  This  suggests  that 
the technique can be profitably used to analytically determine thresholds for
model-based matching. Because our analysis shows that the proper threshold varies with
the number of model and data features, it is important to be able to set the threshold
as  a  function  of  a  particular  matching  problem  rather  than  setting  it  once  based  on 
experimentation. 

As a first example, we consider the application of the interpretation tree method
[Grimson and Lozano-Pérez, 1984, 1987; Ettinger, 1987, 1988; Murray, 1987a, 1987b;
Murray and Cook, 1988] to recognizing sets of two dimensional parts. In this
approach, a tree of possible matching model and image features is constructed. Each
level of the tree corresponds to one of the image features. At every node of the tree
there is a branch corresponding to each of the model features, plus a special branch
that accounts for image features that do not match the model. A path from the
root to a leaf node maps each image feature onto some model feature or the
special “no-match” symbol. The tree is searched by maintaining pairwise consistency
among the nodes along a path. Consistency is checked using distance and angle
relations between the model and image features specified by the nodes. If a given node
is inconsistent with any node along the path to the root then the subtree below that
point is pruned from further consideration.
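The search just described can be sketched in a few lines. This is a deliberately simplified illustration, not the RAF implementation: features are 2D points rather than edges, the only pairwise constraint is agreement of inter-point distances to within an assumed tolerance `eps`, and each model feature is used at most once. All names are hypothetical.

```python
import math

def consistent(pairing, new_pair, data, model, eps):
    """Pairwise check of a new (data, model) pair against every pair on the path."""
    j, J = new_pair
    for i, I in pairing:
        if I == J:  # simplification: a model feature is matched at most once
            return False
        if abs(math.dist(data[i], data[j]) - math.dist(model[I], model[J])) > eps:
            return False
    return True

def search(data, model, eps, level=0, pairing=()):
    """Depth-first search of the interpretation tree; yields consistent leaves."""
    if level == len(data):
        yield pairing
        return
    for J in range(len(model)):
        if consistent(pairing, (level, J), data, model, eps):
            yield from search(data, model, eps, level + 1, pairing + ((level, J),))
        # an inconsistent pair prunes the whole subtree below it
    # the "no-match" branch: leave this data feature unexplained
    yield from search(data, model, eps, level + 1, pairing)

def best_interpretation(data, model, eps, f0):
    """Accept the largest consistent pairing if it covers at least f0 of the model."""
    best = max(search(data, model, eps), key=len)
    return best if len(best) >= f0 * len(model) else None

model = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
data = [(10.0, 10.0), (14.0, 10.0), (10.0, 13.0), (50.0, 50.0)]  # last point is clutter
print(best_interpretation(data, model, eps=0.1, f0=0.5))
```

The acceptance test in `best_interpretation` is where a threshold such as $f_0$ would be applied.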

A consistent path from the root to a leaf that accounts for more than some
fraction of the model features is accepted as a correct match. This threshold is chosen
experimentally. In our analysis of thresholds for the interpretation tree method,
we use the parameters for the objects demonstrated in [Grimson and Lozano-Pérez,
1987], and the parameters for a typical scene in the experimentation described there.
These values are substituted into equation (2), and then a threshold $f_0$ is computed
using equations (3) and (3a).

In the experiments reported in [Grimson and Lozano-Pérez, 1987], the following
parameters hold:

$m = 32$
$s = 100$
$L = 23.959$
$\epsilon_p = 10$
We have computed $c$ as a function of the amount of occlusion $\alpha$, and then
determined the corresponding threshold $f_0$ on the fraction of model features. Note
that an occlusion of 1 represents the limiting case in which only a point on the line
is visible. The results are given in Table 1. The first column of the table shows the
values of $f_0$ computed using equation (3a). Recall that this method integrates over
the transformation space in order to limit the expectation of a randomly occurring
match by setting

$$\delta = \frac{\nu}{2\pi D^2}.$$


For comparison, the second and third columns of the table are computed using
equation (3), with the probability of a random match, $\delta$, set to .001 and .0001,
respectively.

Occlusion   $f_0$, eqn (3a)   $f_0$, $\delta = .001$   $f_0$, $\delta = .0001$
0.0         0.225             0.173                    0.230
0.1         0.244             0.188                    0.250
0.2         0.263             0.202                    0.270
0.3         0.282             0.217                    0.289
0.4         0.301             0.231                    0.308
0.5         0.319             0.245                    0.327
0.6         0.337             0.259                    0.346
0.7         0.355             0.273                    0.364
0.8         0.374             0.287                    0.383
0.9         0.392             0.301                    0.401
1.0         0.409             0.315                    0.420

Table 1. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the RAF system.
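The entries of Table 1 can be recomputed from equations (3) and (3a). The sketch below uses the parameters listed in the text together with two assumed values, `EPS_A = pi/10` and `D = 500`, which are not listed in this section but yield numbers that agree with the table to three decimal places in the rows checked.

```python
import math

M, S, L, EPS_P = 32, 100, 23.959, 10.0   # parameters given in the text
EPS_A, D = math.pi / 10, 500.0           # assumed values, not listed in this section

def c_of(alpha):
    """Normalized volume c for 2D edges at occlusion alpha."""
    return (EPS_A / math.pi) * (
        2.0 * alpha * (EPS_P / D) * (L / D) + math.pi * (EPS_P / D) ** 2
    )

def f0(alpha, delta):
    """Equation (3) evaluated at occlusion alpha."""
    lam = M * S * c_of(alpha)
    return math.log(1.0 / delta) / (M * math.log(1.0 + 1.0 / lam))

DELTA_3A = EPS_A * EPS_P ** 2 / D ** 2   # the choice of delta behind equation (3a)

def table_row(alpha):
    return (alpha, f0(alpha, DELTA_3A), f0(alpha, 1e-3), f0(alpha, 1e-4))

for k in range(11):
    alpha, a, b, c = table_row(k / 10)
    print(f"{alpha:.1f}  {a:.3f}  {b:.3f}  {c:.3f}")
```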

As expected, the bound on $f$ increases as the amount of occlusion increases.
Note that this bound is limited in scope even as the occlusion factor ranges over the
entire possible range: even for occlusions ranging from none (0) to all (1), the
bound on $f$ only varies over a range of 0.225 to 0.409. It is interesting to compare
these results with empirical observations. Grimson and Lozano-Pérez report that
running the RAF system on a variety of images of this type with a threshold of $f = .4$
resulted in no observed false positives, while a threshold of $f = .25$ would often
result in a few false positives. Since on average the occlusion was roughly .5, this
observation fits nicely with the predictions of Table 1, namely that a threshold of .4
should yield no errors, while a threshold of .25 cannot guarantee such success.

If we use the lengths of the data features to weight the individual feature
matchings then substituting into equation (4) leads to the predictions shown in Table 2.
These values were computed using equation (3a) in the same manner as the first
column of Table 1. Again, this agrees with empirical experience for the RAF system,
in which weighted matching using thresholds of $f = .25$ almost always led to no
false positives, while using thresholds of $f = .10$ would often result in a few false
positives.

Occlusion   $f$ with $\bar{\ell} = L$   $f$ with $\bar{\ell} = .75L$   $f$ with $\bar{\ell} = .5L$
0.0         0.119                       0.091                          0.062
0.1         0.136                       0.103                          0.071
0.2         0.153                       0.116                          0.079
0.3         0.171                       0.129                          0.088
0.4         0.188                       0.142                          0.097
0.5         0.205                       0.155                          0.105
0.6         0.222                       0.168                          0.114
0.7         0.240                       0.181                          0.123
0.8         0.257                       0.194                          0.131
0.9         0.274                       0.207                          0.140

Table 2. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the RAF system. In this case, the lengths of the matched edges are
used, instead of just the number of matched edges.

As a second example, we consider the HYPER system of Ayache and Faugeras
[1986]. Similar to RAF, HYPER also uses geometric constraints to find matches of
data to models. An initial match between a long data edge and a corresponding
model edge is used to estimate the transformation from model coordinates to data
coordinates. This estimate is then used to predict a range of possible positions
for unmatched model features, and the image is searched over this range for
potential matches. Each potential match is evaluated using position and orientation
constraints, and the best match within error bounds is added to the current
interpretation. The additional model-data match is used to refine the estimate of the
transformation, and the process is iterated.

Although not all of the parameters needed to apply our analysis are given in
the paper, we can estimate many of them from the illustrations provided in the
article. Given several estimates for the error in the measurements, a range of values
for the threshold $f$ is listed in Table 3. Object-1 and Object-2 refer to the object
labels used by Ayache and Faugeras. In these examples, we use orientational errors
of $\epsilon_a = \pi/10$ and $\pi/15$ radians and positional errors of $\epsilon_p = 3$ pixels.

In  HYPER,  a  threshold  of  .25  is  used  to  discard  false  positives,  and  Ayache  and 
Faugeras  report  the  observation  of  no  false  positives  during  a  series  of  experiments 
with  their  system.  For  the  two  objects  listed  in  Table  3,  Ayache  and  Faugeras 
report  that  their  system  found  interpretations  of  the  data  accounting  for  a  fraction 
of  .55  of  the  model  for  Object-1  and  accounting  for  a  fraction  of  .40  of  the  model  for 
Object-2.  Both  these  observations  are  in  agreement  with  the  thresholds  predicted 
in  Table  3,  for  different  estimates  of  the  data  error. 

Thus  for  two  different  recognition  systems  (RAF  and  HYPER),  using  both  weighted 
and  unweighted  matching  schemes,  we  see  that  the  technique  developed  in  this  paper 
yields  matching  thresholds  that  are  similar  to  those  determined  experimentally  by 
the  designers  of  the  systems. 


            Object-1                                     Object-2
Occlusion   $f$ ($\epsilon_a = \pi/10$)   $f$ ($\epsilon_a = \pi/15$)   $f$ ($\epsilon_a = \pi/10$)   $f$ ($\epsilon_a = \pi/15$)
0.0         0.224                         0.185                         [illegible]                   0.168
0.1         0.243                         0.199                         [illegible]                   0.181
0.2         0.261                         0.212                         [illegible]                   0.195
0.3         0.279                         0.225                         0.262                         0.208
0.4         0.297                         0.238                         0.280                         0.221
0.5         0.315                         0.251                         0.298                         0.234
0.6         0.333                         0.261                         0.316                         0.247
0.7         0.350                         0.277                         0.335                         0.260
0.8         0.368                         0.289                         0.353                         0.273
0.9         0.386                         0.302                         0.371                         0.285

Table 3. Predicted bounds on termination threshold, as a function of the amount of
occlusion, for trials of the HYPER system. The first two columns for $f$ are for Object-1, the
final two for Object-2.


6.  Conclusion

In order to determine what constitutes an acceptable match of a model to an image,
most recognition systems use an empirically determined threshold on the fraction
of model features that must be accounted for. In this paper we have developed a
technique for analytically determining the fraction of model features $f_0$ that must
be matched in order to limit the probability of a random conspiracy of the data to
some level $\delta$. This fraction $f_0$ is a function of the type of feature, the number of
model features $m$, the number of sensor features $s$, and bounds on the translation
error $\epsilon_p$ and the angular error $\epsilon_a$ of the sensor and feature detector.

Our  analysis  shows  that  the  proper  threshold  varies  with  the  number  of  model 
and  data  features.  A  threshold  that  is  appropriate  for  relatively  few  data  features 
is  not  appropriate  when  there  are  many  data  features.  Thus  it  is  important  to  be 
able  to  set  the  threshold  as  a  function  of  a  particular  matching  problem,  rather  than 
setting  a  single  threshold  based  on  some  experimentation.  The  technique  developed 
in  this  paper  provides  a  straightforward  means  of  computing  a  matching  threshold 
for  the  values  of  m  and  s  found  in  a  given  recognition  situation. 

We  have  applied  the  technique  to  two  existing  recognition  systems,  and  found 
that  the  predicted  thresholds  are  close  to  those  that  were  determined  experimentally. 
This  suggests  that  the  method  can  be  profitably  used  to  analytically  determine 
thresholds  for  model-based  matching  systems. 

Appendix:  Extending  the  method  to  other  cases 

So  far,  we  have  demonstrated  our  method  on  the  case  of  recognition  of  rigid  two- 
dimensional  objects  from  two-dimensional  edges  or  vertices.  We  can  readily  extend 
our method to other cases as well. In general, equations (3) and (4) still hold, with
the proviso that $c$ changes as the problem changes. First we consider adding
scaling to the two-dimensional transformation, and then we consider three-dimensional
transformations.

Objects  that  scale 

First, we consider the case in which a two-dimensional object is free to scale within
some predefined range. In this case, the space of possible transformations is
four-dimensional, having two dimensions for translation parameters, one for rotation,
and one for scale.

In this case, the transformation from model coordinates to sensor coordinates
may be represented by

$$\mathbf{v}_s = sR_\theta\mathbf{v}_m + \mathbf{v}_0$$

where $\mathbf{v}_m$ is a vector in model coordinates, $R_\theta$ is a rotation matrix corresponding to
an angle of $\theta$, $s$ is a scale factor, $\mathbf{v}_0$ is a translation offset, and $\mathbf{v}_s$ is the corresponding
vector in sensor coordinates.

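As in the earlier case, if we consider the conditions on the transformation so
that the endpoint of a data edge corresponds to the endpoint of a model edge, we
find that

$$\mathbf{m}_j - \frac{\ell_j}{2}\,\mathbf{t}_j = sR_\theta\left[\mathbf{M}_J - \frac{L_J}{2}\,\mathbf{T}_J\right] + \mathbf{v}_0.$$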


We also have the condition that

$$sL_J \ge \ell_j$$

so that

$$s \ge \frac{\ell_j}{L_J}$$

or, if we allow for error in the measurements, that

$$s \ge \frac{\ell_j - 2\epsilon_p}{L_J}.$$

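Hence, for each choice of $s$, the translation is

$$\mathbf{v}_0 = \mathbf{m}_j - sR_\theta\mathbf{M}_J + \frac{sL_J - \ell_j}{2}\,R_\theta\mathbf{T}_J.$$

Similarly, if the other endpoints align, we get

$$\mathbf{v}_0 = \mathbf{m}_j - sR_\theta\mathbf{M}_J - \frac{sL_J - \ell_j}{2}\,R_\theta\mathbf{T}_J.$$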


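Because any intermediate position is also acceptable, the set of translations
consistent with matching model edge $J$ to data edge $j$ is given by

$$V(\theta, s) = \left\{ \mathbf{m}_j - sR_\theta\mathbf{M}_J + \gamma R_\theta\mathbf{T}_J \;\middle|\; \gamma \in \left[-\frac{sL_J - \ell_j}{2},\; \frac{sL_J - \ell_j}{2}\right] \right\}. \qquad (5)$$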


Hence, matching model edge $J$ to data edge $j$ yields a set of points in transform
space $V$, with a single value for the rotation parameter and a set of values for the
translation, that correspond to a line of length $sL_J - \ell_j$, with orientation $R_\theta\mathbf{T}_J$
in the x-y plane, where $s$ can range from

$$\frac{\ell_j - 2\epsilon_p}{L_J}$$

to some predefined maximum $s_h$.

To determine the full volume of transformation space consistent with a
data-model feature pairing, we must allow for noise in the measurements. As in the
non-scaled case, we can integrate over a range of orientations within $\epsilon_a$ of the computed
one, and we must also integrate along the line of translations defined above as $s$
varies. This implies that the full volume is given by

$$2\epsilon_a \int_{s = (\ell_j - 2\epsilon_p)/L_J}^{s_h} \left[ 2\epsilon_p\left(sL_J - \ell_j\right) + \pi\epsilon_p^2 \right] ds$$


which  reduces  to 


2t*  (**  -  l-Lr~)  H + 2£p 


This is normalized by the total volume of transformation space

$$2\pi D^2\left(s_h - 1\right)$$

to yield


€«/**- 

C,  -  —  - 

ft  .  SK 


For cases involving scale, we can substitute $c_s$ in place of $c$ in the earlier analysis.


Three  dimensional  case 


As a second extension, consider the problem of recognizing three-dimensional
objects from three-dimensional edges. In this case, the transformation space is six
dimensional, with three dimensions for translational components, and three for
rotational components. As in the previous cases, we must deduce an expression for $c$
that holds in this case.


We begin with the rotational parameters. Given the unit tangent vector of a
model edge and of a data edge, there is a set of rotations that will consistently map
the model tangent into the data tangent. The axis of rotation that will accomplish
this lies anywhere on the great circle on the unit sphere equidistant from the two
tangent vectors. For each such axis of rotation, there is a unique angle of rotation
that will effect the mapping. When we allow for error, the data tangent is only
known to within a cone of radius $\epsilon_a$, and hence the great circle expands into a band
of feasible axes of rotation. If we integrate out the volume of feasible rotations, we
get

2  f  f  sin  cj)d(f>d6  =  -  cf. 

J 9=0  J 0=4  -cos-1  e„ 


To account for the translation, we find that an analysis similar to the
two-dimensional case holds. In particular, the set of feasible translations is a cylinder of
radius $\epsilon_p$ and length $\alpha L$, where $\alpha$ is the amount of occlusion of the edge, capped by
two hemispheres of radius $\epsilon_p$. Hence, the overall volume is given by


-  e l  (aLx(2p  +  |jre^  . 
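The translational factor of this volume follows directly from the geometry just described: a cylinder of radius $\epsilon_p$ and length $\alpha L$ plus two hemispherical caps, i.e. $\pi\epsilon_p^2\alpha L + \frac{4}{3}\pi\epsilon_p^3$. The sketch below covers only this translational part (the rotational factor is not included), and the function name is illustrative.

```python
import math

def translation_volume_3d(eps_p, alpha, L):
    """Volume of a cylinder of radius eps_p and length alpha*L,
    capped by two hemispheres of radius eps_p."""
    cylinder = math.pi * eps_p ** 2 * alpha * L
    caps = (4.0 / 3.0) * math.pi * eps_p ** 3  # two hemispheres form one sphere
    return cylinder + caps

# With alpha = 0 the region degenerates to a sphere of radius eps_p.
print(translation_volume_3d(eps_p=1.0, alpha=0.0, L=5.0))
print(translation_volume_3d(eps_p=2.0, alpha=0.5, L=10.0))
```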
If the linear range of values for each dimension of translation is $D$, then the
normalized coefficient in the three dimensional case is given by


c  = 


v/rr 


el(aL  +  i(tL))(it)'- 

V  D  3 \dJJ \d) 